ABSTRACT


A theoretical model of how an expert programmer goes
about understanding a piece of software is presented. This
understanding plays an especially critical role in software
maintenance tasks. The model is based on three cognitive
processes: CHUNKING, SLICING, and HYPOTHESIS GENERATION and
VERIFICATION. These processes are used in conjunction with
a programmer's knowledge base, and categories of information
critical to program understanding are identified. The model
also takes advantage of certain characteristics of an
associative memory to describe, using a semantic net
representation, the mechanisms behind these processes and
the organization of memory resulting from their use. The
benefits of documentation and the use of commenting and
mnemonics are described in terms of the model, and this
description may be useful as a guide for incorporating them
into the code.

I. INTRODUCTION


Software maintenance now accounts for a large percentage
of any software system's life-cycle cost. In view of this,
the software industry has shifted its emphasis with respect
to program evaluation. No longer is software being judged
solely on the merits of its applicability to a given
problem. While not neglecting the importance of this, the
industry is considering factors which affect software
maintenance as well. One such factor is software
understandability [Ref. 1].

Gaining an understanding of unfamiliar programs is
frequently cited by researchers as the first and often most
costly step in software maintenance. This understanding is
achieved when the programmer has 'learned' all that is
necessary to competently carry out the required maintenance
task. Making software easier to understand would have
significant long term advantages, resulting in reduced
life-cycle costs. This study presents a theoretical model of
cognitive processes, based on observed programmer behavior,
which aid in acquiring this understanding. Further, the
study contends that the effectiveness of these processes is
dependent upon the extent of the programmer's knowledge
base.


Most cognitive research analysing programmer behavior
supports the idea of levels of skill or ability, and
categorizes programmers as either novice, experienced, or
expert. Based on the proposed theoretical model, this
ability is defined by how well the processes are developed
by the programmer, and the extent of his or her knowledge
base.

A novice has a relatively limited knowledge base.
Consequently, there is very little development of the
cognitive processes in evidence. He or she is considered
primarily a learner, using mainly unsophisticated
techniques, such as inductive reasoning, to gain an
understanding of a program.

An experienced programmer has a fairly extensive
knowledge base. It includes information about most of the
knowledge domains necessary for program understanding. The
depth of information in these domains is, however, uneven.
By this it is meant that an experienced programmer may know
algorithms to perform a certain function, for example to
sort numbers, but may find it difficult to adapt one of
these to sort words. Or, in the category of programming
languages, he or she may be familiar with the syntax and
semantics, but unsure of the underlying design and its
effects on a program.

Although the programmer is still learning, the primary
emphasis at this stage of a programmer's growth is the
development of cognitive processes which make efficient use
of this knowledge. At this stage, the programmer's
performance is good, though inconsistent, over a spectrum of
less difficult tasks. It does, however, degrade rapidly as
task difficulty increases, indicative of only partially
developed processes and the uneven knowledge base.

An expert, on the other hand, has acquired a broad
knowledge base, including many specifics about programming
languages and design, algorithms and data structures, task
domains, etc., as well as how they relate to one another.
He or she has a consistently high level of performance as
well, proportional to task difficulty. This results from a
demonstrated use of well developed cognitive processes.

These processes, which make use of the knowledge base
in conjunction with external information (program text,
documentation, problem specifications, etc.), enhance the
expert's ability to gain an in-depth understanding of the
software involved in a given maintenance task. It is this
demonstrated capability that distinguishes the expert from
either a novice or experienced programmer.

Acknowledging this, the choice for this study is to
model an expert involved in the task of understanding an
unfamiliar program in order to perform some type of
maintenance. What these processes are, how they are used,
and what information is contained in the knowledge base
form the major portions of this model. Realizing the
subjective nature of the study, it is not a claim that this
is a definitive model. It is, however, reasonable and
representative of programmer behavior demonstrated by
experts. In fact, this study contends that it is this very
behavior of making efficient use of these processes which
determines expertise in this area.


II. MEMORY AND RECALL


We know empirically that information is remembered
(stored in the brain) and can be recalled. Most evidence
also supports the hypothesis that human memory is at least
partly associative [Ref. 2]. By this it is meant that
facts, events, concepts, and other types of information are
encoded and stored in memory as separate elements or sets of
elements, connected to one another by means of association.
Each element is stored only once, but can have any number of
associations with other elements. Each element is also
directly accessible. One method of knowledge representation
which incorporates many of the concepts and properties
associated with this type of memory is the semantic net.

As there is no evidence that strongly supports any
theory yet proposed to explain how memory and recall are
accomplished, it should be noted that the model proposed
here uses semantic nets only as a tool. The ideas of
semantic nets will aid in explaining certain cognitive
processes. However, the model itself has been developed
based on research data and its validity is independent of
this or any other theory regarding how these rudimentary
cerebral functions, memory and recall, are accomplished.

Memory is commonly thought of as having two parts or
areas. These are labeled Long Term Memory and Short Term

III. KNOWLEDGE BASE


"Experts and novices differ in their abilities to process
large amounts of meaningful information. ... A common
explanation of this difference is that experts have not
only more information, they have the information better
organized ... making their perception more efficient and
their recall performance much higher." [Ref. 10]

The above quote emphasizes the importance of both the
contents and the organization of the knowledge base.
Included in the discussion presented here is the conviction
that the contents of memory somehow affect this
organization. Also, based on data from several studies
referenced, this organization is dynamic and dependent on
context.

A. CONTENTS

Along with basic knowledge, normally acquired through
grade school and college, the expert programmer knows a
great deal about five major categories of knowledge
associated with programming. These are:

- ALGORITHMS
- PROGRAMMING LANGUAGES
- LOGIC
- DATA STRUCTURES
- PROGRAMMING DESIGN METHODOLOGIES

helps to compensate for the limited capacity of short term
memory, and complements long term memory. All methods used
for this purpose are generally referred to as external
memories.


C. LONG TERM MEMORY

When we learn or memorize something, the information is
retained in long term memory. When some event causes the
recall of other events in the mind, the information comes
from long term memory. It is the reservoir of permanent
knowledge used in cognition, and has stored in it everything
from the spatial model of the world to the motor and
perceptual skills used moment to moment [Ref. 9: pg. 56].
Put simply, it is the knowledge base we operate from.

Unlike short term memory, the capacity of long term
memory seems virtually unlimited. It receives and stores
new information after processing in short term memory, and
this information is directly accessible, once stored. Also,
research has shown that the knowledge in long term memory is
organized, and that the organization may change almost
instantaneously, based on the context of the information
being processed in short term memory. As will be seen
later, this ability is significant in terms of the model,
and will be discussed in more detail as it relates to an
expert programmer's knowledge base.

D. EXTERNAL MEMORY

As an aid to information processing, external devices
such as pencil and paper, chalkboards, and tape recorders
are used to store information not in long term memory which
the programmer wants readily available for reference. This


B. SHORT TERM MEMORY

Information enters the cognitive system through short
term memory. CURTIS [Ref. 6] quite adequately describes
this memory as:

"a limited capacity workspace which holds and processes
those items of information currently under our attention."

This limited capacity was first quantified by MILLER as 7±2
items [Ref. 7]. As will be seen later, an item is not
limited to a single memory element, and may be a 'chunk' of
indefinite size.

The information which exists in short term memory is
transient and must be constantly used or 'rehearsed' to
prevent its rapid decay [Ref. 8]. If the information is
gained via perception, this rehearsal will, after a time,
fix the information in long term memory. This is sometimes
called the learning process. If, on the other hand, the
information being used was recalled from long term memory,
this rehearsal serves to reinforce it. This reinforcement
has a positive effect on the future recall of this
information and may cause it to migrate due to repetitive
use. Both rapidity of recall and information migration are
discussed later as they pertain to the model.

The implication of this analogy is that semantic nets
are organized hierarchically. If this idea is accepted, it
follows that in order to recall a certain piece of
information, several levels of the hierarchical structure
must be transited, depending on the point of entry. This
walk through several levels necessarily has an adverse
effect on the speed of recall. Yet, in some instances,
information which should be separated by several levels is
recalled faster than expected, implying an alternative
method. To explain this, MINSKY introduces a second notion
which allows for shortcuts through several levels. The
argument is that if a certain path is reinforced a number of
times through use, a direct link is formed, analogous to
taking back roads to avoid lights and traffic.
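
The shortcut notion can be illustrated with a small sketch.
The Python below is only an illustration under stated
assumptions: the class name HierarchicalNet, the threshold
parameter, and its default of three uses are hypothetical and
are not taken from MINSKY or from the model. Recall walks up
the hierarchy one level at a time, and a direct link is formed
once the same path has been walked often enough.

```python
from collections import defaultdict


class HierarchicalNet:
    """Toy hierarchy in which repeated recall along a path forms a shortcut."""

    def __init__(self, threshold=3):
        self.parent = {}              # child -> parent (one level up)
        self.shortcut = set()         # (start, target) pairs joined by a direct link
        self.uses = defaultdict(int)  # how often each (start, target) path was walked
        self.threshold = threshold    # hypothetical number of uses before a shortcut forms

    def add_link(self, child, parent):
        self.parent[child] = parent

    def recall(self, start, target):
        """Return the number of links traversed from start to target, or None."""
        if (start, target) in self.shortcut:
            return 1                  # direct link: a single step
        steps, node = 0, start
        while node != target:
            if node not in self.parent:
                return None           # target does not lie above start
            node = self.parent[node]
            steps += 1
        # Reinforce the path; after enough uses, form the direct link.
        self.uses[(start, target)] += 1
        if steps > 1 and self.uses[(start, target)] >= self.threshold:
            self.shortcut.add((start, target))
        return steps
```

With links house -> street -> city -> region, the first three
recalls from house to region each cross three levels, after
which the same recall takes a single step.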

These properties of semantic nets reflect those of an
associative memory and will be referred to extensively
throughout the remainder of this paper. Details will be
added as necessary to further explain behaviors, and this
should make these semantic net properties clearer. However,
it is important for the reader to understand these before
proceeding.


One way to represent this in a semantic net is to view
an object as a node bundle. This bundle consists of a
general object node as well as a number of nodes each
representing a different perspective for that object. Links
relevant to a particular context are associated with the
corresponding perspective node.

With such a representation, shown for CAR in Figure 5,
slot values are accessed either with or without a
perspective. Say, for example, the size of CAR is needed.
If CAR is with reference to a train, the returned value would
be quite a bit different than if the inquiry were made for a
toy car. If no perspective is given, the node bundle
collapses to the single CAR node used throughout this
example. This causes all possible slot values to be
returned, each annotated with the associated perspective.
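
A node bundle of this kind might be sketched as follows. This
is a minimal illustration, not the representation used in the
figure: the class NodeBundle, its method names, and the SIZE
values (MEDIUM, LARGE, SMALL) are assumptions introduced only
for the example.

```python
class NodeBundle:
    """A general object node plus one node per perspective."""

    def __init__(self, name):
        self.name = name
        self.perspectives = {}  # perspective -> {slot: value}

    def add(self, perspective, slot, value):
        self.perspectives.setdefault(perspective, {})[slot] = value

    def get(self, slot, perspective=None):
        if perspective is not None:
            # Access through a single perspective node.
            return self.perspectives.get(perspective, {}).get(slot)
        # No perspective given: collapse the bundle and return every
        # value, each annotated with its perspective.
        return {
            p: slots[slot]
            for p, slots in self.perspectives.items()
            if slot in slots
        }


car = NodeBundle("CAR")
car.add("AUTOMOBILE", "SIZE", "MEDIUM")  # hypothetical slot values
car.add("TRAIN", "SIZE", "LARGE")
car.add("TOY", "SIZE", "SMALL")
```

Here car.get("SIZE", "TOY") consults only the toy perspective,
while car.get("SIZE") returns one value per perspective.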

This notion of node bundles and object classification
leads to the idea of node clustering. Put simply, a node
cluster is a grouping in the net of objects and links
strongly associated with one or two specific objects of the
cluster. MINSKY uses a geographic analogy to illustrate the
idea [Ref. 5: pg. 118]. He suggests picturing capital
cities with streets lined by houses. These cities are
connected via major thoroughfares to smaller suburban cities,
which are in turn connected to towns, etc. The analogy to
clusters, objects, and links is readily apparent.


Another quality of an associative memory is the ability
to distinguish the correct usage of an object, through
context or perspective, when many different meanings exist.
This dependency on context must also be represented in the
net. Work cited by COHEN supports the idea that objects
each have many classifications, determined by context [Ref.
pp. S-ie]. This is because certain objects, when viewed
from different perspectives, take on new or different
qualities and attributes. A car, for example, can be looked
at as an automobile, or as a toy, or as the car of a train.
Obviously, each will have different attributes which are
identified through context. The result is one object with
three distinct purposes or aspects.

Figure 5 - A Perspective Node Bundle


whose abstraction is represented. These are truly default
values, whose purpose is to fill a void until more specific
information is obtained. This information is not an
exception to the frame, but an expected piece of data
previously missing or unknown.


Figure 4 - Semantic Net with Exception


specific color, CAR28 is incomplete. To remedy this, it
inherits the default value RED, until such time as its own
color is added to the knowledge base.

Exceptions and unusual circumstances must also be
accounted for. Using the CAR example again, suppose CAR28
is an experimental model using compressed air for power.
The PROPULSION slot of the CAR frame is filled with the
value ENGINE, yet for CAR28, this would be incorrect. Prior
to knowing the method of PROPULSION, it is 'assumed' that
CAR28 is powered by an engine. Once the method is known,
however, a PROPULSION link is added to CAR28, reflecting the
exception. Now, in trying to fill the PROPULSION slot for
CAR28, the first value arrived at is COMPRESSED-AIR, the
search stops, and the frame slot value becomes
inconsequential. Figure 4 is the representative net.
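
The effect of adding an exception link can be sketched in a
few lines. The dictionary layout and the function fill_slot
are assumptions made for illustration. The search starts at
the specific object, follows IS-A links upward, and stops at
the first value found, so a value stored on CAR28 itself hides
the one in the CAR frame.

```python
def fill_slot(net, node, slot):
    """Search node, then its IS-A ancestors; stop at the first value found.

    Returns the value together with the list of nodes visited.
    Assumes the IS-A chain contains no cycles.
    """
    visited = []
    while node is not None:
        visited.append(node)
        slots = net.get(node, {})
        if slot in slots:
            return slots[slot], visited
        node = slots.get("IS-A")
    return None, visited


net = {
    "CAR": {"PROPULSION": "ENGINE"},  # value held in the CAR frame
    "CAR28": {"IS-A": "CAR"},         # nothing specific known yet
}

# Before the method of propulsion is known, the frame value is assumed.
before = fill_slot(net, "CAR28", "PROPULSION")

# The method becomes known: add the exception link to CAR28 itself.
net["CAR28"]["PROPULSION"] = "COMPRESSED-AIR"
after = fill_slot(net, "CAR28", "PROPULSION")
```

In this sketch, before holds the frame value reached through
CAR, and after holds the exception found on CAR28 without the
frame being consulted.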

By this explanation, it may appear that all objects
making up a frame are default values, and exceptions nothing
more than specific slot values in lieu of the default.
Each, however, is subtly different. A frame is made up of
attributes of an object. Some, such as engine, tire, or
seat, are common to the majority and as such are not
substitute values, used for lack of one more specific, but
the same value shared among many objects. An exception is
where particulars of an object contradict any of these
shared values. Others, such as color, are common attributes
with possibly different values for each instance of the item

specific objects which are instances of the abstraction.
These properties or slot values are then inherited by the
more specific instances, making the net less complicated.

Slots can be added to or, although less likely,
subtracted from a frame. This would occur due to additional
information being incorporated into the net. Because of the
dynamics of frames, they always represent the most current
abstraction relative to the entire semantic net.


Figure 3 - A Frame

A frame also serves to provide DEFAULT values for
incomplete pictures. Let's say, for illustrative purposes,
that one of the slots of the frame representing CAR is the
COLOR slot, and it is filled with the value RED. Now
further suppose another object CAR28 is introduced, but
without a COLOR link. Since all cars must have some

When CAR27 is thought of, many facts about it come to
mind. It has an engine, tires, and seats. Also, it is a
vehicle used for transportation. Does this mean that, using
our representation, the object CAR27 should have direct
links to the objects ENGINE, TIRE, SEAT, VEHICLE, and
TRANSPORTATION? The answer is no. The way this information
is represented is through a property called inheritance, and
the use of frames.

Inheritance is an object's acquisition of a slot value
by inheriting the value from another object through
association. Figure 2 is a semantic net showing one
representation of the above facts about CAR27. As can be
seen, CAR27 has no USED-FOR link, but does have an IS-A link
to the more abstract object, CAR. However, CAR also has no
USED-FOR link, but is associated to the object VEHICLE
through an AKO (A Kind Of) link. In tracing the net from
CAR27, VEHICLE is the first node reached which does have a
USED-FOR slot value, TRANSPORTATION. CAR27, therefore,
inherits this value through its indirect association with
VEHICLE.
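
The trace described above can be written as a short lookup
routine. This is a sketch under assumptions: the dictionary
encoding of the net, the function name inherit, and the choice
to report the node that supplied the value are illustrative
and not part of WINSTON's description.

```python
INHERITANCE_LINKS = ("IS-A", "AKO")  # links followed when a slot is empty


def inherit(net, node, slot):
    """Return (value, source) from the first node on the chain that fills slot."""
    seen = set()
    while node is not None and node not in seen:
        seen.add(node)  # guard against accidental cycles
        slots = net.get(node, {})
        if slot in slots:
            return slots[slot], node
        # Move to the more abstract object, if there is one.
        node = next(
            (slots[link] for link in INHERITANCE_LINKS if link in slots),
            None,
        )
    return None, None


net = {
    "CAR27": {"IS-A": "CAR"},
    "CAR": {"AKO": "VEHICLE", "HAS": ["ENGINE", "TIRE", "SEAT"]},
    "VEHICLE": {"USED-FOR": "TRANSPORTATION"},
}
```

Calling inherit(net, "CAR27", "USED-FOR") passes through CAR
and finds the value at VEHICLE, while the HAS values are
supplied by the CAR frame one level up.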

Again looking at Figure 2, notice the object CAR is
linked to some familiar characteristics of a car via HAS
links. This area of the net, isolated in Figure 3, is
called a FRAME. A frame is a set or cluster of objects
which serve as slot values for an abstract or less specific
object. Its purpose is to group properties common to many

filled with the value BLUE, the IS-A slot with the value
CAR, and the OWNED-BY slot with the values DOUG and JILL.
Note that the objects do not have to be tangible items, as
illustrated by the object BLUE. Figure 1 is, of course, a
representation of the knowledge that CAR27 is a blue car
owned by Doug and Jill.
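
The net of Figure 1 can be sketched as a list of labeled
links, with slots obtained by grouping the links that leave a
node by name. The class SemanticNet and its methods are
hypothetical names chosen for this illustration.

```python
from collections import defaultdict


class SemanticNet:
    """Directed graph stored as (source, label, target) links."""

    def __init__(self):
        self.links = []

    def add_link(self, source, label, target):
        self.links.append((source, label, target))

    def links_from(self, node):
        return [link for link in self.links if link[0] == node]

    def slots(self, node):
        """Group the links leaving node by label: slot name -> list of values."""
        grouped = defaultdict(list)
        for _, label, target in self.links_from(node):
            grouped[label].append(target)
        return dict(grouped)


net = SemanticNet()
net.add_link("CAR27", "IS-A", "CAR")
net.add_link("CAR27", "COLOR", "BLUE")
net.add_link("CAR27", "OWNED-BY", "DOUG")
net.add_link("CAR27", "OWNED-BY", "JILL")
```

CAR27 has four outgoing links here but only three slots,
because the two OWNED-BY links share one slot holding both
DOUG and JILL.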


Figure 2 - Inheritance in Semantic Nets


between objects are called links. They are represented in
the figures by labeled circles and arrows respectively. A
third term used by WINSTON, which is less standard, is the
slot. The slots of a node are the different named links
originating at the node. An example might serve here to
better describe the use of these terms.

In Figure 1, we have an example of a semantic net. The
five objects are CAR27 which is a specific car, CAR which is
a general abstraction, DOUG and JILL which represent
specific people, and the object BLUE. There is an OWNED-BY
link between CAR27 and DOUG, and between CAR27 and JILL.
There is an IS-A link between CAR27 and CAR, and there is a
COLOR link between CAR27 and BLUE. CAR27 has four links
associated with it, but only three slots. The COLOR slot is


Figure 1 - A simple semantic net

Memory. This may not be a physical division, though some
researchers suggest that they are located in different areas
of the brain, but rather one of cognition. Some researchers
also include a third area, Working Memory. As the validity
of this additional division of memory is not critical to the
model, the simpler idea is adopted. A final form of
'memory', called External Memory, is also used.

A. SEMANTIC NETS

A semantic net is a directed graph made up of nodes,
representing objects, connected to one another via links.
These links indicate specific relationships or associations
between nodes. This representation of knowledge is very
popular among members of the Artificial Intelligence
community. As there is no definitive set of characteristics
for a semantic net, those relevant to the model proposed
here are described. Much of this information is taken from
a text by WINSTON [Ref. 3], whose description seems standard
when compared to others in the literature. Properties have
been added or altered, however, to aid in explaining certain
behaviors of expert programmers. It is emphasized again
that the model is based on observed behavior, and in no way
depends on the validity of this presentation of semantic
nets, or any other knowledge representation.

Three terms are used here to describe semantic nets.
The objects of the net are called nodes and the relations


The depth of knowledge in these categories allows the
expert to quickly focus on the important aspects of new
information. Using the processes covered in the next
chapter, he or she can then encode this information and
relate it to what is already in long term memory.

Experts are familiar with many algorithms which do
essentially the same job. Associated with each in the
knowledge base is a set of benefits, drawbacks,
applications, and, either implicitly or explicitly, a
complexity evaluation. Choosing integer sorting as a
representative task, there are several options: Merge Sort,
Comparison Sort, Radix Sort, and Quick Sort, to name a few.
Each is useful in accomplishing the sort; however, each is
also especially suited to certain applications. Each also
has variations which are applicable to other types of sorts.
The expert is familiar with these, as well as the underlying
principles which differentiate them from one another. This
allows him or her to readily adapt these algorithms to meet
different needs, lexicographic sorting for instance.
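
As one illustration of such an adaptation, the sketch below
shows a merge sort in which the comparison is isolated behind
a key function, so the same routine sorts integers or words.
The key parameter and the case-insensitive ordering are
assumptions chosen for the example, not a technique taken from
the study.

```python
def merge_sort(items, key=lambda x: x):
    """Stable merge sort; key selects the value compared for each item."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)

    # Merge the two sorted halves, preferring the left item on ties
    # so that equal items keep their original order.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(right[j]) < key(left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


numbers = merge_sort([5, 2, 9, 1])
words = merge_sort(["pear", "Apple", "fig"], key=str.lower)
```

Only the key changes between the two calls; the divide and
merge steps, which are the underlying principle of the
algorithm, are untouched.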

Like algorithms, data structures have many variations.
The expert is familiar with these and with the underlying
principles behind their design as well. This allows easy
modification to meet new requirements and aids the expert in
recognizing design flaws such as lack of flexibility or
expandability. The expert also has knowledge of algorithms
and can correlate a given data structure with an algorithm
or group of algorithms for a specific application. The
expert can also relate information on programming languages
to data structures, evaluating the relative ease with which
specific structures can be used and manipulated.

Programming languages are, to some degree, familiar to
all programmers, whatever their skill level. An expert,
however, is not only versed in the syntax and semantics of
several languages. He or she is also familiar with the
advantages and disadvantages of one language design, or
particular machine implementation, over another. While the
choice of language is not an option for the programmer
tasked with maintaining or debugging, the particular design
and implementation features play an important role when
porting software from one machine to another.

Knowledge of language design and implementation also
allows the expert to make judgements about software
efficiency and memory needs. This knowledge also allows for
identifying potential trouble spots, usually avoiding
analysis of the entire program. This is particularly
important when evaluating possible effects of a
modification.

Information about algorithms also contributes to the
knowledge of languages. As most languages have built-in
functions, the expert can evaluate the particular algorithms
used to implement these. This evaluation adds to his or her
knowledge base of programming languages, aids in efficiency
analyses, and is useful in predicting the accuracy of
results. Supported by this knowledge, an expert may choose
to substitute other routines using more applicable
algorithms, for such things as increased accuracy in
calculations, more efficient device drivers, or faster
access to secondary storage. He or she might also choose to
replace programmed functions with ones built into the
language, for the same reasons.

Knowledge regarding logic is important in two ways.
First, it enables the expert to learn the specific
implementation of control statements in a programming
language, adding this to his or her knowledge base. Second,
it aids in evaluating the flow of control in a given piece
of software. Both help in analysing the efficiency of the
software. Taking the following IF-THEN statement:

IF ( A > 10 ) OR ( B < 15 ) THEN C = D

the expert would know, or could test, whether or not the
second comparison is executed independent of the result of
the first. Taking advantage of this type of information
could greatly impact the software's efficiency, saving money
and CPU time.
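
Such a test is easy to write. The sketch below uses Python,
whose `or` operator skips the second operand when the first is
true; other languages may evaluate both, which is exactly what
the test would reveal. The function names and the constants 10
and 15 are assumptions chosen for illustration.

```python
calls = []  # records which comparisons were actually executed


def first_comparison(a):
    calls.append("first")
    return a > 10


def second_comparison(b):
    calls.append("second")
    return b < 15


def evaluate(a, b):
    """Evaluate (a > 10) OR (b < 15) and report which comparisons ran."""
    calls.clear()
    result = first_comparison(a) or second_comparison(b)
    return result, list(calls)
```

Calling evaluate(20, 0) runs only the first comparison,
whereas evaluate(5, 0) must run both before the result is
known. Ordering the cheaper or more frequently true comparison
first takes advantage of this behavior.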

Programming design methodologies are treated differently
from other categories in the knowledge base. They cannot
be defined in specific terms, as we have done with the
others, and are seen as more of a gestalt type of knowledge.
They help the expert in analysing possible side effects,
which is, in part, a function of modularity. They play a
major role in processes to be presented later, such as
CHUNKING, SLICING, and HYPOTHESIZING.

Aside from knowledge of programming, the expert
maintainer must also know something of the specific
application area. The level or amount of information
necessary is dependent upon the modification to be
implemented. At the very least, however, the programmer
needs to know enough to be able to interpret the
documentation and program specifications in order to make a
judgement regarding potential side effects of the change.
This information is either learned information in long term
memory, which can be recalled for future tasks, transient
information used and then forgotten, or information kept as
reference using an external memory.

The view of this study is that what is contained in the
knowledge base directly affects the programmer's ability to
understand a given piece of software. Obviously, what the
programmer knows at the outset about the program's task
domain, and information related to it, will impact on his or
her difficulty in gaining this understanding. Extending
this idea, a large disparity in the knowledge level
significantly affects the level of competence of the
programmer and, consequently, the relative quality of the
work.

The cognitive processes which interact with this
knowledge base, in order for the programmer to achieve this
understanding, perform essentially three functions. Factual
information is analysed and added to the knowledge base,
concepts and methodologies are abstracted from
documentation, and information from one category is
associated with that from another (such as correlating a
data structure with an algorithm). These functions serve to
integrate all information available to the programmer
applicable to the task.

This knowledge base is not simply a collection of facts.
It is the organized accumulation of information into a
network reflecting semantic associations. This organization
is as important as the information itself.

B. ORGANIZATION

Studies of recall show that people tend to organize
information into categories and groupings. Most items or
objects in memory are members of more than one of these
categories, dependent on context. A piano is a member of
the musical instrument category, and can be sub-categorized
as a keyboard instrument in the context of musical
instruments. It is also a member of the category which
includes hutch and dresser when viewed as a heavy piece of


Grouping by order is another observed way memory has
been organized. A person asked to list the ingredients of a
recipe, for example, will more than likely list them in
order of their use. When asked to list items necessary to
equip a home, housewives listed these items either by
category (kitchen utensils, furniture, window coverings) or
by considering necessary items room by room [Ref. 4: pp.
fi-ll].

The evidence provided by these studies supports the
hypothesis that memory is organized dynamically, based on
the context of the stimulus. It also implies that this
organization makes use of information clustering. What is
meant here is that information elements related by context
'migrate' toward certain key elements or toward one another.
In either case, this clustering strengthens associations in
context between these information elements, enhancing
recall. As explained in a later chapter, this enhancement
aids cognition by making pertinent information readily
available to short term memory, while 'blocking' irrelevant
associations involving these same elements.

Because these groupings are determined by context, the
amount of information contained in the knowledge base
associated with each element has a bearing on their
categorization. The greater the amount of associated
knowledge, the more refined the groupings can be. As more
knowledge is gained and this refinement continues, new
clusters are formed to replace those less refined, and
association between any two becomes more specific. This, in
turn, results in a reorganization of memory.

The studies cited here involve simple element lists.
However, this idea is easily extended to more complex
information elements, such as concepts, ideas, and
abstractions, which are themselves clusters of information.
The implication throughout this chapter is that different
knowledge categories or domains are used best when
integrated. How the contents and organization of memory
relate specifically to the expert, and how this integration
is accomplished, are addressed in the following chapter.

IV. THE PROCESSES


SHNEIDERMAN and MAYER conjecture that, to facilitate
program comprehension:

"the programmer, with the aid of his or her syntactic
knowledge of the language, constructs a multileveled
internal semantic structure to represent the program."
[Ref. 11]

The present study has identified, in the context of software
maintenance, three major complementary cognitive processes,
supported by certain lesser ones, used to accomplish this.
Further, it is the tenet of the study that the entire
program need not be represented in memory, but only that
part which is of interest as determined by the programmer.

The descriptions of these processes have been formulated
from observed programmer behavior. The ideas presented are
extensions of theories based on empirical data resulting
from limited testing. Introduction and subsequent treatment
of these ideas in the literature has been, in many cases,
artfully vague, with researchers characteristically relying
on intuitive understanding through example. Therefore,
although an attempt is made here to more clearly define
these processes, the next chapter presents a scenario
exemplifying the application of each.

A. CHUNKING

The cognitive process known as 'chunking' is a learned
skill, enabling a programmer to encode information in such a
way that a group of information elements can be represented
and processed as a single element in short term memory
[Ref. 7]. As mentioned previously, short term memory is
where information processing occurs, and is characterized as
having a limited capacity. This grouping or organizing of
information allows programmers to operate on 'chunks' of
associated information rather than single items. This
gives the programmer a broader perspective of the task.

Chunking is a very dynamic process, in terms of the
knowledge base. A chunk is created when an association is
formed between an encoded item in short term memory and its
corresponding information cluster in long term memory. This
cluster is the result of a reorganization of memory based on
the context of the stimulus which initiated the chunking
process. It can be added to or deleted from, based on the
results of partial completion of the task for which it was
created, or as information is learned, regarding the task,
through other processes.

Chunking associations may also be formed between the
encoded item and information in external memories. These
associations may access information directly, or might
simply guide the programmer to a reference in which the
necessary information is contained. In either case, they
allow the programmer the use of transient or task specific
information. At the same time, they relieve the
programmer of the burden of having to learn the information
so it might be added to the cluster, or of having to store
it in short term memory before it is needed.

The amount of information represented by a chunk is
arbitrary [Ref. 12]. Its size is dependent on how much
associated information is contained in the knowledge base,
and to what extent external memories are used. The results
of research by MILLER and others indicate that the number of
items used or stored in short term memory is relatively
constant. From this it can be concluded that the number of
chunks which can be processed is independent of chunk size
[Ref. 13: pg. 177, Ref. 9: pg. 44]. Thus, chunking
effectively increases the capacity of short term memory as
relates to information processing.
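
The capacity argument above can be illustrated with a small
sketch. It is only a toy model under stated assumptions: the
capacity of seven is an illustrative figure, not a value taken
from this study, and the names Chunk, ShortTermMemory, and
reachable_facts are hypothetical. The point it shows is that
the number of chunks held stays fixed while the amount of
information reachable through them grows with cluster size.

```python
from dataclasses import dataclass, field

STM_CAPACITY = 7  # assumed illustrative limit on items held at once


@dataclass
class Chunk:
    """One encoded item standing for a cluster of associated facts."""
    label: str
    cluster: list = field(default_factory=list)  # kept in long term memory


class ShortTermMemory:
    def __init__(self, capacity=STM_CAPACITY):
        self.capacity = capacity
        self.items = []

    def hold(self, chunk):
        """Admit a chunk if a slot is free; chunk size is never consulted."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(chunk)
        return True

    def reachable_facts(self):
        """Count the facts reachable through the chunks currently held."""
        return sum(len(chunk.cluster) for chunk in self.items)


small = ShortTermMemory()
large = ShortTermMemory()
for i in range(10):
    small.hold(Chunk(f"item{i}", ["one fact"]))
    large.hold(Chunk(f"idiom{i}", [f"fact{j}" for j in range(20)]))
```

Both memories stop admitting items at the same count, yet the
one holding larger clusters gives access to far more
information.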

Besides allowing more information to be handled in
short term memory, chunking also allows the programmer quick
access to specific information which is part of the chunk.
The reason is that chunks, representing information
clusters, enhance recall of that information. All knowledge
associated with the chunk has effectively been accessed, and
can be thought of as staged for recall. This can best be
explained by using a semantic net representation.

When the chunk is created, a reorganization of the
knowledge base takes place, and information migrates,
forming a high density node cluster. Again, the size of
this cluster depends on the extent of the knowledge base.
This density decreases the length of nodal links, resulting
in a shorter walk from the initial access node or capital of
the cluster to the desired information element. The
association between the encoded item and the knowledge base
is one example of the 'shortcut' described earlier, and
links short term memory to the capital of the cluster.

The perspective has also been identified and
associations between nodes not in context have been
deemphasized. All the information represented by the chunk
is now just beyond the programmer's consciousness waiting to
be recalled. The encoded item can therefore be processed,
representing a group of knowledge, with specific items
associated with the chunk rapidly recalled for use when
necessary.

Some researchers, such as KINTSCH, suggest that chunks,
once formed, can be permanently stored in long term memory
[Ref. 13: pg. 175]. This idea is inconsistent with the
presentation here, and research for this study has uncovered
no data to support the hypothesis. KINTSCH himself
differentiates between what a chunk is in short and long
term memory. His idea of stored chunks closely corresponds
to the earlier presentation of information clustering. As
it is the contention of this study that a chunk exists only
so long as it is under the programmer's attention, this
notion of permanently stored chunks is disregarded.

B.  SLICING 

Expert  programmers  break  large  unfamiliar  programs  into 
smaller  coherent  pieces  in  order  to  gain  an  understanding  of 
their  function  and/or  design.  Often,  these  pieces  are 
determined  by  the  original  writers  of  the  code.  They  are 
identified  as  blocks  of  code  in  the  form  of  subroutines, 
procedures,  functions,  and  the  like.  Identification  is 
usually  explicit  and  the  pieces  are  written  into  the  source 
as  contiguous  lines  of  program  text.  One  can  think  of  these 
as  functional  pieces  of  the  program. 

Also, experts routinely partition programs in ways that
do not conform to textual, modular, or functional structure,
permitting multiple views of the same code. Unlike
functional pieces, which have a one-to-one correspondence
between function and purpose of code lines, this type of
division allows lines of code to be viewed from different
perspectives. This associates a single line of code with
more than one purpose. The construction of these views is
what WEISER, who first proposed the idea, calls 'Program
Slicing'. The process is used to strip from a program
statements which do not influence a specific behavior or
slicing criterion. The result is an abstract representation
of the program as viewed from the perspective of the
specific behavior. This group of statements, usually
associated with a single variable, is called a program
slice [Ref. 14: pg. 439, Ref. 15: pg. 446].
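
The stripping of statements which do not influence a slicing
criterion can be sketched for the simplest possible case. The
fragment below is an illustration under stated assumptions
only: it handles straight-line code with no branches or loops,
each statement is annotated by hand with the variable it
defines and the variables it uses, and the names
backward_slice and statements are hypothetical. It is not
WEISER's algorithm, which must also account for flow of
control.

```python
def backward_slice(statements, criterion):
    """Return indices of statements that can influence `criterion`.

    Each statement is (defined_variable, used_variables, text).
    Straight-line code only: control flow is ignored.
    """
    relevant = {criterion}
    kept = []
    # Walk backwards from the end of the program.
    for index in range(len(statements) - 1, -1, -1):
        defined, used, _text = statements[index]
        if defined in relevant:
            # This definition satisfies the need for `defined`;
            # whatever it reads becomes relevant in its place.
            relevant.discard(defined)
            relevant.update(used)
            kept.append(index)
    return sorted(kept)


statements = [
    ("grade", set(), "READ grade"),
    ("total", set(), "total := 0"),
    ("count", set(), "count := 0"),
    ("total", {"total", "grade"}, "total := total + grade"),
    ("count", {"count"}, "count := count + 1"),
    ("highest", {"grade"}, "highest := grade"),
]

slice_on_total = backward_slice(statements, "total")
```

Slicing the same six statements on a different variable keeps
a different subset, which is the sense in which one line of
code can belong to more than one view.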

Slicing is important in maintenance because typically
only a subset of the program's behaviors is being improved
or replaced. By eliminating non-influential code, the
maintainer's job is made simpler. He or she can then deal
with a much smaller 'program'. While this program may not
be syntactically correct, it is semantically correct for the
behavior of interest.

Also,  the  entire  piece  of  software  need  not  be  sliced. 
If  a  point  in  the  flow  of  control  can  be  identified  which 
bounds  the  slicing  criterion,  then  only  that  part  of  the 
code  still  to  be  executed  need  be  sliced.  This  further 
reduces  the  programmer's  task. 

Two key areas of the knowledge base are especially
influential in determining the effectiveness of a
programmer's slicing ability. Programming logic allows the
maintainer to easily identify bounds of a specific behavior.
He or she can, with an extensive knowledge base, trace
through the program's flow of control easily and accurately,
recognizing particular logic features of the language.
Also, the expert's in-depth knowledge of the programming
language gives him or her the ability to readily identify
lines of code which impact the slicing criterion. For
example, familiarity with how data is passed and whether or
not it is altered by code or simply used and returned
without change (i.e., pass by reference, value, or name) could
greatly affect the size of the slice.

The extent to which experts employ slicing seems to
depend on the program. Testing by WEISER shows that factors
influencing the use of slicing are code size, structure, and
ease of understanding [Ref. 15: pp. 459-461]. This suggests
that slicing is found by experts to be most effective on
poorly structured programs, and less so on those which are
well designed and make use of modules, comments, and
mnemonics. Effectiveness here is a relative measure of the
amount of work eliminated and/or information gained by
slicing.

The work by WEISER also demonstrates that expert
programmers independently develop their own style of
slicing. This does not preclude teaching its principles to
less able programmers, but points out the process's
dependence on the knowledge and experience of the
individual. It also points to the fact that it is a
subjective process and cannot presently be implemented
fully. For the interested reader, however, WEISER does
describe algorithms for approximating slices and discusses
the effectiveness of two automatic slicing tools [Ref. 14].

that if a number is compared to anything but another number,
a 'type mismatch' occurs. Therefore, STUD_GRADE[I] must be
a number. This verifies the first slot of the frame.

The REPEAT-UNTIL block of the original slice is
recognized as a looping construct. This, coupled with the
fact that one variable inside the loop is used as an index,
allows the programmer to chunk the block as "BUILD AN
ARRAY". This chunk is associated with the grade input and,
based on this context, the information cluster associated
with the grade data structure is processed. It is found to
include the class of array data structures, and so the
second slot and its corresponding hypothesis are also
verified. With all code now mapped, the entire input
representation is considered verified, as all higher level
hypotheses inherit the verification. Also, with reference
to the last verification, it should be noted that the
information cluster and hypothesis were further refined to
reflect that a particular class, the array class, of list
structures was used.

If  a  contradiction  does  occur  in  verification,  a  walk  up 
the  subtree  takes  place.  Each  hypothesis  is  checked  until 
one  is  found  which  the  information  does  not  contradict.  A 
new  hypothesis  is  formed  at  the  next  lower  level  as  a 
refinement  of  this  hypothesis,  and  all  hypotheses  below  this 
level  are  reevaluated  based  on  the  new  context.  A  similar 
process  takes  place  if  information,  other  than  that 


Assume the following is the result of the slicing
process:

    READ STUDENT
    REPEAT
        I = I + 1
        READ STUD_GRADE[I]
    UNTIL STUD_GRADE[I] = 999

The programmer now attempts to verify the hypotheses against
the code. The READ STUDENT line stands alone as
verification of the hypothesis that each student is input.
To verify the two hypotheses associated with grades is
slightly more complicated. The READ STUD_GRADE[I] statement
would be adequate to verify the hypothesis that student
grades were input. However, it fails to confirm that it is
a numerical representation. To confirm this, if no
declaration statement exists, the programmer must analyse
the behavior of the variable. The code resulting from the
slicing process based on input is itself sliced, this time
on STUD_GRADE[I]. The UNTIL STUD_GRADE[I] = 999 statement
becomes the only other line in the slice.

The programmer recognizes the UNTIL statement as a
compare and branch operation and notes that the variable is
compared to a number. His or her knowledge of the
programming language is extensive enough to realize that 999
must be a number and not a string. Also, he or she knows

of the hypotheses formed. However, this understanding is
not appreciably diminished unless the erroneous hypothesis
is located in a top level of the hierarchy.

Continuing to focus on input, in order to verify this
representation the programmer needs to slice the source code
using input behavior as the criterion. Then, each line of
code in the slice must be mapped to a leaf-frame or slot of
the input subtree. Note that these leaf-frames or slots do
not all have to be on the same level.


Figure 6 - Memory Representation of Program (Input)


information in the program, otherwise it is arbitrary. For
example, the ordering of algorithms would be important in
understanding the program, whereas the ordering of data
classes in the frames created from the input hypotheses, for
example the one representing the hypothesis that both grades
and student identification are input, is not important for
program understanding. If subsequent analysis reveals that
a specific ordering is necessary, the frame would be
reorganized to reflect this, because of the new context.

The value of each slot is an information cluster
representing a knowledge domain, as frames representing
hypotheses use classes of information and not specific
elements. The cluster is formed based on the context
defined by the hypothesis which the frame or slot
represents. The initial hypothesis' INPUT slot has, as a
value, a cluster representing all data types or classes that
the programmer associates with grades. When the subsequent
hypotheses are formed, defining the input as STUDENT IDENT
and GRADE, this cluster is reorganized into a two slot
frame, each slot representing a sub-cluster of the original.
The value of the STUDENT IDENT slot becomes all possible
representations by which students can be identified, and the
value of the GRADE slot becomes the cluster of all possible
classes of grade representation contained in the knowledge
base. Any elements or nodes of the original cluster not
associated with either of these new clusters are not
visible from this frame down, similar to the idea of
scoping in some programming languages. So on one level,
there is a single cluster representing the hypothesis as a
grouping of all possible input data classes, while on
another level, this same information, or a subset of it, is
viewed as two separate clusters. This reorganization of
information occurs because of the change in context when the
subsidiary hypotheses are introduced.

The programmer has now increased his or her
understanding of the program. In addition to what was
expected based on the original hypothesis, the programmer
now also expects that:

-  grades  are  numerical 

-  each  student's  set  of  grades  is  processed  separately 

-  the  grades  are  initially  input  into  a  list  structure 

-  the  grades  are  summed  and  averaged 

-  each  student  is  identified  with  his  or  her  grades 

-  a  mapping  takes  place  from  average  to  letter 

-  student ID and corresponding letter grade are stored

Figure 6 shows this representation focusing on the input
subtree of the hypothesis hierarchy. Each level can be
thought of as a level of understanding. It should be noted
that, at this point, no verification has taken place and
this level of understanding is contingent on the correctness


domain, an output domain, and a domain of algorithms on
which the processing of the data is assumed based. While
this is certainly not specific enough a representation of
the software to enable the programmer to do any useful work,
a level of understanding has been achieved.

Further  reading  of  the  documentation  reveals  that  each 
student's  grades  will  be  read  in,  summed,  and  the  average 
converted  to  a  letter  grade  and  stored.  This  information 
suggests  many,  more  specific,  data  and  algorithmic  classes, 
and  several  levels  of  hypotheses  are  formulated.  Presuming 
that,  at  this  point,  the  programmer  begins  to  develop 
hypotheses  in  a  quasi  depth-first  order,  focusing  on  input, 
one  hypothesis  would  be  that  grades  are  read  in  as  numbers. 
Another  might  be  that  each  student's  identification  is  input 
in  conjunction  with  his  or  her  grades.  The  grade  data 
hypothesis  is  then  refined,  forming  a  lower  level  hypothesis 
that  grades  will  be  represented  as  integers  and  handled  as  a 
list.  Note  that  at  this  point,  the  programmer  is  not 
interested  in  what  representation  is  used  for  student 
identification,  possibly  because  hypotheses  about  the 
processing  of  the  data  suggest  that  the  identification  data 
will  be  used  but  not  altered,  so  specific  typing  will  not  be 
necessary.

In memory, each hypothesis is represented as a frame
with ordered slots. This ordering, if relevant, is based on
the expected or confirmed ordering of the representative

A. A WALK-THROUGH

Suppose a programmer is given a program that he or she
has never seen before and asked to perform some modification
to it. Further suppose that to do this modification, an
overall understanding of the program is necessary. He or
she most likely begins by looking at the documentation.
After reading a small part of the documentation, perhaps
a phrase or sentence, the programmer forms a hypothesis. He
or she has ascertained that the program averages student
grades. This defines a context, and a reorganization of
memory takes place. This reorganization results in a large
information cluster, forming a frame. It contains slots
such as INPUT DATA, OUTPUT DATA, and PROCESSES.

The value of the INPUT DATA slot, based on the
programmer's knowledge of how school grades are arrived at,
is a cluster of possible types or classes of data. These
would include, at this level, every type of data in his or
her knowledge base that the programmer associates with
school grades, as well as all possible data structures
associated with them. The values of the other slots would
be of a similar nature.

So by simply reading a single phrase, 'computes student
grade averages', the programmer has constructed an internal
representation of the program. He or she expects that it
takes some input data, processes this data, and outputs the
result. In addition, he or she has identified an input

V. SCENARIO


A scenario is now presented to help exemplify how each
process applies to the task of program comprehension. It is
meant to give the reader an intuitive understanding of
application and effects, as well as the mechanisms
underlying these cognitive processes. The reader should
also gain an understanding of the interrelationships between
the processes, the knowledge base, and information relating
specifically to the program. It is the collective use of
these which gives the expert his or her superior skills.
For simplicity, a structured program is assumed as well as
an ALGOL-like programming language. Again, semantic nets
are used to represent memory organization.

The program used for this scenario will be one which
computes averages of student grades and outputs a letter
grade for each. It is a fairly structured program with
adequate documentation and uses mnemonics but no comments in
the source code.


For verification, the hypotheses forming the leaves of
the tree are tested against the code. Two conditions are
necessary for verification of the hierarchy. First, code
corresponding to the hypothesis being verified must be in
the program. Second, all code must be accounted for by one
of the hypotheses. If either of these conditions fails, the
structure is reorganized to reflect this and any new
information gained from it.
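
The two verification conditions can be stated compactly as a
check over a mapping from leaf hypotheses to lines of code.
The sketch below is illustrative only: the function name
check_hierarchy, the encoding of the mapping as sets of line
indices, and the particular assignment of lines to hypotheses
are assumptions made for the example. The code lines are those
of the sliced input routine used in this scenario.

```python
def check_hierarchy(leaf_hypotheses, slice_lines):
    """Apply the two verification conditions.

    leaf_hypotheses maps each leaf hypothesis to the set of line
    indices the programmer has mapped to it (an assumed encoding).
    Returns (unsupported hypotheses, unexplained line indices).
    """
    # Condition 1: code corresponding to each hypothesis must exist.
    unsupported = sorted(
        name for name, lines in leaf_hypotheses.items() if not lines
    )
    # Condition 2: all code must be accounted for by some hypothesis.
    explained = set().union(*leaf_hypotheses.values())
    unexplained = [i for i in range(len(slice_lines)) if i not in explained]
    return unsupported, unexplained


slice_lines = [
    "READ STUDENT",
    "REPEAT",
    "I = I + 1",
    "READ STUD_GRADE[I]",
    "UNTIL STUD_GRADE[I] = 999",
]

mapping = {
    "each student is input": {0},
    "grades are numerical": {3, 4},
    "grades are held in a list": {1, 2, 3, 4},
}

result = check_hierarchy(mapping, slice_lines)
```

An empty pair of lists corresponds to a verified hierarchy. A
non-empty first list names hypotheses with no code behind them,
and a non-empty second list points at code that no hypothesis
yet explains; either outcome calls for reorganization.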


Each introduction of new information causes a
reorganization of memory due to the change in context. This
reorganization would make use of confirmed information, old
or new, and may cause a change in default or normal values
not yet verified. If this change in context occurs at a low
level of the hierarchy, the programmer's perspective will
change only slightly. If, however, the change affects slot
values in the top levels, reorganization of a large subtree
might occur, giving the programmer a significantly different
view of the problem. The view could also change if the
programmer chooses to shift his or her attention from the
overall view, to a more refined hypothesis, focusing then on
a subtree of the hierarchy. This would have the effect of
emphasizing the details contained in this subtree and
'chunking' the remainder. The hypothesis hierarchy is
therefore dynamic, changing with every shift in context.

Verification can take place at any time. It usually
occurs when the programmer reaches a level of understanding
about the behavior of the program that he or she wishes to
confirm. This can be because the programmer has reached a
level of understanding believed adequate for the task he or
she needs to perform, or it might simply be to validate
certain hypotheses before continuing. One reason for
intermediate validation is that it lessens the effects of
discovering an invalid hypothesis or contradiction.

organization would greatly reduce the amount of searching
necessary to identify this class of situations.

The benefits of these analogies, when they exist, are
taken advantage of in generating hypotheses. As stated
earlier, the programmer makes maximum use of his or her
knowledge base. This is accomplished by relying on
previously learned information regarding a general solution
already familiar to him or her. In this case, the specifics
of the software solution need only be learned if and when
they are needed and differ from those of the general one.
This is a much reduced task, relative to learning the entire
solution (or program) when no such analogies exist in the
knowledge base.

Returning to the discussion of hypotheses, the
hierarchical structure can be explained easily by once again
using a semantic net representation. Each hypothesis can be
thought of as a frame. Each slot value of a frame would
either be an information element or a frame itself,
obviously more specific than the one whose slot it fills.

Initially, all frames (hypotheses) would contain either
default or normal values. As more information is processed
regarding the software, these values would be confirmed or
replaced. These new values could be frames, representing
still more specific hypotheses. Normal values, when
contradicted, are replaced by exceptions specific to the
problem at hand.
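
The frame structure just described can be sketched as a small
data structure. This is one possible encoding offered as an
illustration, not a representation given by the study: the
class names Slot and Frame, the status labels, and the method
names are all assumptions. The slot names are those of the
grade averaging scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    value: object              # an information element or a nested Frame
    status: str = "default"    # "default", "confirmed", or "replaced"


@dataclass
class Frame:
    hypothesis: str
    slots: dict = field(default_factory=dict)

    def assume(self, name, value):
        """Fill a slot with a default or normal value."""
        self.slots[name] = Slot(value)

    def confirm(self, name):
        """Mark the current value as verified."""
        self.slots[name].status = "confirmed"

    def replace(self, name, value):
        """Substitute a more specific value, possibly another Frame."""
        self.slots[name] = Slot(value, "replaced")

    def depth(self):
        """Number of hypothesis levels at and below this frame."""
        nested = [
            slot.value.depth()
            for slot in self.slots.values()
            if isinstance(slot.value, Frame)
        ]
        return 1 + max(nested, default=0)


program = Frame("program averages student grades")
program.assume("INPUT DATA", "any data associated with school grades")
program.assume("OUTPUT DATA", "some form of result")
program.assume("PROCESSES", "some averaging algorithm")

inputs = Frame("student identification and grades are input")
inputs.assume("STUDENT IDENT", "any identification")
inputs.assume("GRADE", "any grade representation")
program.replace("INPUT DATA", inputs)   # refinement adds a level
inputs.confirm("STUDENT IDENT")
```

Replacing a slot value with another frame is what adds a level
to the hierarchy; slots left untouched keep their default
values until information confirms or contradicts them.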
possible, the supposed information. Only when a
contradiction occurs is this information replaced.
Obviously, this process is dependent on the programmer's
having seen similar problems before. It seems appropriate,
therefore, to digress for a moment to address this idea of
sameness or analogy.

As was mentioned before, information in memory is
organized into groups based on certain parameters or
constraints. How, in fact, this grouping is accomplished
is still not understood; however, it does occur. As
associations are virtually limitless, it seems logical to
assume that groupings are as well. Similar problems could
therefore be grouped and an abstract set of circumstances
formed to encompass dominant characteristics of the group.
This idea is similar to that of a frame. Then, as problems
are introduced, they are compared against these dominant
characteristics. If the characteristics match, the problem
is considered analogous.

As  this  matching  process  seems  a  mammoth  task  as 
presented,  consider  the  reduction  of  work  if  these  sets  of 
circumstances  were  grouped  by  single  characteristics, 
incorporating  confidence  levels,  or  another  method  of 
rating,  to  distinguish  most  from  least  dominant  in  the  set. 
This  would  cause  stronger  and  weaker  associations,  leading 
to  the  most  probable  set  first,  analogous  to  an  electron 
following  the  path  of  least  resistance.  This  type  of 


is that, at any level of understanding, the programmer
should be able to produce a functionally equivalent program
in any language that he or she is familiar with.

The title of the program, or a succinct presentation of
the task for which the software was written, usually
suggests enough information for the programmer to generate a
hypothesis about the general flow of the program. This
hypothesis would incorporate expected input and output types
with a corresponding class or group of possible data
structures. It would also have classes of algorithms and
abstract logical constructs in its make-up, with the
programmer essentially forming an overview of how the
program might work. Note that these are classes and not
specific elements.

As more information about the program is processed,
these ideas are refined by generating other, more specific
hypotheses based on new, more focused expectations. As
mentioned, a hierarchy would begin to form, each level
further refining the expectations used to generate the
hypotheses above. As each new level is formed, it
incorporates more information about the program. The result
is more factual information in support of these hypotheses,
and less supposition based on previous knowledge of similar
tasks. This is not to say that knowledge base information
is replaced by that newly learned about the task. Rather,
facts about the problem are used to verify, whenever


C. HYPOTHESIS PROCESS


The third, and perhaps most powerful, process used by
experts is hypothesis generation, refinement, and
verification. It is a top-down process which allows for
maximum utilization of the programmer's knowledge base, the
overall depth of which determines the effectiveness of the
process. It involves the generation, based on information
in the knowledge base, and subsequent refinement and
verification of hypotheses regarding the programmer's
suppositions about how the code was designed and written.
As more and more information about the software is
processed, a hierarchy of these hypotheses is constructed.

This hierarchy is built quasi depth-first. This is
because a programmer has a tendency to focus on one area,
forming a cascade of refinement hypotheses through several
levels before shifting his or her attention. The programmer
does, however, remain cognizant of the other areas.
Therefore, information encountered while refining the
current area of interest is often used to form hypotheses
relating to these other areas as well.
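
A small sketch in Python may make the quasi depth-first construction
concrete. It is an illustration only: the class name Hypothesis, its
fields and methods, and the sample hypothesis texts are assumptions
introduced here and are not part of the model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    """One node of the hierarchy: a supposition about the code (hypothetical type)."""
    text: str
    verified: bool = False
    children: List["Hypothesis"] = field(default_factory=list)

    def refine(self, text: str) -> "Hypothesis":
        """Attach a more specific hypothesis one level below this one."""
        child = Hypothesis(text)
        self.children.append(child)
        return child

    def depth(self) -> int:
        """Number of levels at and below this node."""
        return 1 + max((c.depth() for c in self.children), default=0)


# Top-level hypothesis suggested by the program's title (sample text).
root = Hypothesis("program averages student grades")

# Focus on one area and refine it through several levels:
# this is the cascade of refinement hypotheses.
summing = root.refine("grades are held in an array and summed")
loop = summing.refine("a WHILE loop walks the array up to a sentinel value")
loop.refine("the sentinel value is 999")

# Information met along the way seeds a hypothesis about another area,
# which stays unrefined until attention shifts to it.
root.refine("the average is reported to the user")

print(root.depth())        # levels reached in the most refined area
print(len(root.children))  # areas the programmer is keeping track of
```

The uneven shape of the resulting tree, one deep branch beside a shallow
one, is what distinguishes quasi depth-first growth from a level-by-level
construction.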

The hierarchical structure can be thought of as defining
levels of understanding. The greater the depth, the more
the programmer has refined his or her understanding of the
software. By building this hierarchy, the programmer is
creating an internal representation of the program,
independent of any programming language. The goal or ideal
expected, is found and needs to be included in the
representation. Obviously, the higher up the tree the
change takes place, the greater the memory reorganization
necessary.

Up to this point, the programmer has been forming the
program representation using a top-down approach. However,
there are times when a bottom-up inductive approach is also
necessary. Usually this approach is taken when a
programmer's knowledge base, regarding the task domain, is
incomplete, or when atypical algorithms are used. Here is
where chunking plays a major role. The purpose of this next
example is to demonstrate this role, and not to describe, in
detail, the inductive process.

Suppose the programmer is confronted with a module or
block of code that he or she has formed no hypothesis about
at a specific level. Using the grade averaging example,
assume that the programmer has no knowledge of how averages
are computed, and that the algorithm used is unknown to him
or her. The programmer now tries to understand the
algorithm by inductively reasoning about the code based on
his or her knowledge of lower level functions performed
within it.

At the lowest level, this is accomplished by looking at
individual lines of code and assigning them interpretations
[Ref. 12]. However, because the expert's knowledge base
contains information about constructs and their uses,
certain of these lines are recognized as code included in
the performance of a specific function. BROOKS calls these
'Beacons'.

The block of code is for a standard averaging routine:

    I = 1
    SUM = 0
    WHILE STUD_GRADE[I] <> 999 DO
        SUM = SUM + STUD_GRADE[I]
        I = I + 1
    END_WHILE
    AVERAGE = SUM / (I - 1)

The programmer analysing this code recognizes the first two
lines as assignment statements, and interprets them
individually. He or she now looks at the WHILE line and
recognizes it as a looping construct and beacon for several
functional uses. The next assignment statement has the
assignment variable on both sides of the equal sign, and so
is interpreted as changing the value of SUM by performing
some operation on it, rather than simply assigning it a
value. Once the value added is recognized as an indexed
value, the programmer chunks the loop. He or she has
knowledge base information which shows that an indexed
variable added to that type of assignment statement
indicates an array summation process. So these four lines
are chunked as "SUM STUDENT GRADES". Also, the first two
lines are now chunked as "VARIABLE INITIALIZATION" based on
this new information. The last line is interpreted as an
assignment statement which computes the grade average by
dividing the sum of the grades by the number of grades
summed.

By chunking, the programmer has taken a piece of code,
which could be considered a single chunk which "COMPUTES
GRADE AVERAGES", and formed a representation through
inductive reasoning. The original seven lines of code can
now be interpreted as:

-  Initialize  variables 

-  Sum  grades 

-  Divide  sum  by  number  of  grades  summed 

This  representation  can  stay  in  short  term  memory  to  be  used 
for  the  present  task,  being  linked  to  the  representation  of 
the  rest  of  the  program  in  long  term  memory,  and/or  can  be 
used  to  learn  an  averaging  algorithm  which  could  then  be 
used  for  other  tasks  as  well.  And,  once  learned,  the 
representation  could  be  added  to  that  in  long  term  memory. 
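
The three chunks can be mirrored directly in code. The following is a
sketch in Python rather than in the routine's own notation; the function
names, the list-based input, and the sample grades are assumptions made
for illustration, and 999 is taken to be the end-of-data marker as in the
WHILE condition of the routine.

```python
SENTINEL = 999  # end-of-data marker, as in the WHILE condition


def sum_grades(stud_grade):
    """Chunk 'SUM STUDENT GRADES': add entries until the sentinel is reached."""
    total = 0  # chunk 'VARIABLE INITIALIZATION'
    count = 0
    for grade in stud_grade:
        if grade == SENTINEL:
            break
        total += grade
        count += 1
    return total, count


def average_grades(stud_grade):
    """Chunk 'COMPUTES GRADE AVERAGES': divide the sum by the number summed."""
    total, count = sum_grades(stud_grade)
    if count == 0:
        raise ValueError("no grades before the sentinel")
    return total / count


grades = [80, 90, 100, SENTINEL]  # sample data, not from the original study
print(average_grades(grades))
```

Each function body corresponds to one entry of the chunked
representation, so a reader who has already formed the chunks can verify
them against the code one unit at a time.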
VI. RECOMMENDATIONS


This study has presented a theoretical model of simple
cognitive processes developed and used by programmers.
Further, the study has attempted to demonstrate how the
expert, by using these processes, gains an in-depth
understanding of complex programs. It is unrealistic, at
present, to fully test these ideas because methodologies
have not been developed in the behavioral sciences to do
this. Also, the requisite size and complexity of the
programs, and the time involved, are prohibitive. Research
and the results of limited testing on small scale programs,
however, do suggest certain design techniques, and coding
and documentation methods which directly influence the
effectiveness of these processes.

One such area is code structure, which should be
designed so as to suggest chunks to anyone attempting to
comprehend it [Ref. 13: pg. 175]. Functional elements of
the code should be implemented as contiguous blocks of text
whenever possible. Arbitrary GOTO's and forward and
backward JUMPS should be avoided. Control flow statements
should be used to direct flow from the exit point of one
chunk to the entry point of others. All these
considerations enhance the chunking process by making blocks
of code recognizable as single functions. This results in
making it easier to use the text of the program as an
external memory for these chunks.

Tests conducted by WEISER also indicated that code
structure influences slicing [Ref. IE]. It was found that a
much higher degree of slicing, among 21 expert programmers,
took place when analysing a poorly structured program with
indiscriminate use of GOTO's and non-mnemonic variable names
than when analysing programs which make use of modular
designs, mnemonics, and comments. The value of proper use
of mnemonics and comments to the slicing process is that
they serve to explicitly show data flow and to group
associated statements and functions. This lessens the need
for programmers to ferret out this information. One can
conclude that less effort was required to achieve an equal
level of understanding when good programming techniques were
employed. The use of these maximizes the effectiveness of
slicing while minimizing the effort necessary.
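
The kind of information a programmer must ferret out when slicing can be
illustrated on the grade averaging routine. The sketch below, in Python,
is a deliberately simplified backward slice and is not the algorithm used
in the tests cited above. Each statement is described only by the
variables it defines and uses, loop membership is recorded by hand in a
guard field, statement order is ignored, and the statement table, the
names Stmt and backward_slice, and the labels are all assumptions
introduced for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Stmt(NamedTuple):
    label: str                   # informal description of the statement
    defs: FrozenSet[str]         # variables the statement assigns
    uses: FrozenSet[str]         # variables the statement reads
    guard: Optional[int] = None  # index of the enclosing loop header, if any


def fs(*names: str) -> FrozenSet[str]:
    return frozenset(names)


# Hand-written description of the averaging routine (END_WHILE omitted).
PROGRAM: List[Stmt] = [
    Stmt("initialize I", fs("I"), fs()),
    Stmt("initialize SUM", fs("SUM"), fs()),
    Stmt("WHILE test on STUD_GRADE[I]", fs(), fs("STUD_GRADE", "I")),
    Stmt("add STUD_GRADE[I] to SUM", fs("SUM"), fs("SUM", "STUD_GRADE", "I"), guard=2),
    Stmt("increment I", fs("I"), fs("I"), guard=2),
    Stmt("compute AVERAGE from SUM and I", fs("AVERAGE"), fs("SUM", "I")),
]


def backward_slice(program: List[Stmt], variable: str) -> List[str]:
    """Collect the statements that can influence `variable` (simplified)."""
    relevant = {variable}
    included = set()
    changed = True
    while changed:
        changed = False
        for index, stmt in enumerate(program):
            if index in included:
                continue
            defines_relevant = bool(stmt.defs & relevant)
            guards_included = any(program[i].guard == index for i in included)
            if defines_relevant or guards_included:
                included.add(index)
                relevant |= stmt.uses
                changed = True
    return [program[i].label for i in sorted(included)]


print(backward_slice(PROGRAM, "I"))
```

Slicing on I leaves out the statements concerned only with SUM and
AVERAGE. With mnemonic names and grouping comments, the defs and uses
recorded by hand in the table are largely visible in the program text
itself.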

Comments and mnemonics are also helpful to the chunking
process. A well placed comment, specifying the purpose of a
block of code, and perhaps the data elements affected,
explicitly identifies a functional chunk. This chunk could
then easily be encoded based on the comment alone,
eliminating the need for code analysis at that point.
Meaningful mnemonics would give some insight into their
purpose and thus both aid the recognition and chunking of
complex data structures and help to form correct
hypotheses. These could then be incorporated into still
larger chunks, allowing the many data elements which make up
the structure to be processed as a single element in memory.
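
As a hedged illustration of this point, the Python sketch below writes
the same computation twice: once with a named data structure, mnemonic
identifiers, and chunk-level comments, and once without them. The
structure StudentRecord, its fields, both function names, and the sample
data are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudentRecord:
    """One student's grades, treated as a single element (hypothetical structure)."""
    student_name: str
    exam_grades: List[int]


def class_average(records: List[StudentRecord]) -> float:
    # Chunk: collect every exam grade in the class (affects all_grades).
    all_grades: List[int] = []
    for record in records:
        all_grades.extend(record.exam_grades)

    # Chunk: divide the sum of the grades by the number of grades.
    return sum(all_grades) / len(all_grades)


# The same computation with non-mnemonic names and no comments:
# the reader must analyse each line to discover what x[1] holds.
def f(a):
    t = []
    for x in a:
        t.extend(x[1])
    return sum(t) / len(t)


records = [StudentRecord("A", [80, 90]), StudentRecord("B", [100])]
raw = [("A", [80, 90]), ("B", [100])]
print(class_average(records) == f(raw))
```

Both versions compute the same value; only the first lets the reader
encode each block from its comment and treat a whole record as one
element.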

Program documentation can be, itself, a wealth of
information for the expert programmer. A natural language
explanation of the approach taken in originally designing
the software facilitates the formulation of a fairly
accurate hypothesis regarding its implementation. Citing
explicitly the algorithms employed enables verification of
certain hypotheses without extensive code analysis. Using
this information, the maintainer can more easily focus on
certain functions or behaviors of the code without having to
first analyse it in depth to determine the specifics of its
implementation. If exceptions to standard algorithmic
coding are noted, it saves the programmer from having to
determine why it was coded in such a way. Also, if subtle
effects of the code are included in the documentation, along
with certain potentials for side effects, it would reduce
the testing necessary when a modification is made.

One final area which positively affects the use of these
processes is standardization on all levels. Use of a
standard design methodology would allow programmers to learn
how to best chunk and slice certain representative software
formats. 'Beacons' identifying certain functional areas
could be learned and used effectively. Automatic tools to
aid these processes could also be developed with less
difficulty.

On a more specific level, standardization of algorithms,
and their corresponding constructs would greatly simplify
the task of comprehension. Experts would be able to
incorporate these into their knowledge bases, learning them
from both the functional and the behavioral points of view.
Also, coding templates could be learned and associated with
these, aiding recognition of code itself.

Similar ideas have been used in most other engineering
fields with great success. While software engineering is
not, in many respects, as rigorous as these other
disciplines, standards could be made flexible enough so as
not to inhibit progress. Software reusability is the
motivation for recently generated interest in this area.
The programming language ADA is the first step in an attempt
at achieving some of this standardization, and its use in
conjunction with these processes may serve to verify their
validity.